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1. Introduction 

In this paper we discuss the approximation of f{A)b, where A G C"^" is non-Hermitian and 
/ is a function defined on the spectrum of A such that the extension of / to matrix arguments is 
defined.^ 

The motivation for this rather general setting comes from quantum chromodynamics (QCD) 
formulated on a discrete space-time lattice, where / = sign is of special interest. As the main 
object relevant for our discussion we are focusing on the overlap Dirac operator [3, 4]. The main 
numerical effort lies in the inversion of the overlap operator, which is done by iterative methods and 
requires the repeated application of the sign function of the usual "symmetrized" Wilson operator 
fiw = /s^w (see [5] for the notation) on a vector. 

At zero quark chemical potential /i, the operator Hw is Hermitian. However, one can also study 
QCD at nonzero /i, which is relevant for many physical systems such as neutron stars, relativistic 
heavy ion collisions, or the physics of the early universe. The overlap operator has been generalized 
to this case [5,6]. The computational challenge is the fact that at non-zero chemical potential Hyf 
becomes non-Hermitian. 

This contribution is organized as follows. In Section 2 we review multishift methods which 
have proven to be successful in the Hermitian {pt = 0) case. We will point out the problems that 
occur when applying these methods to the non-Hermitian (/i ^ 0) case. In Sections 3 and 4 we 
present two procedures, restarts and deflation, which — especially when applied in combination — 
make multishift methods applicable to non-Hermitian matrices. We present our numerical results 
in Section 5, and conclusions are drawn in Section 6. 

2. Multishift methods 

First we recall some results for the Hermitian case, i.e., we investigate the computation of 
f{A)b, where A € C"'^" is Hermitian. If A is large, f{A) is too costly to compute, while f{A)b can 
still be obtained in an efficient manner if A is sparse. Krylov subspace methods, i.e., methods that 
approximate f{A)b in a Krylov subspace Kf^{A,b) =span{Zj,A^, . . . are suitable for this 

task. We distinguish between two Krylov subspace approaches: direct projection and multishift. 

Direct projection methods compute the sign function for the projection of A onto Ku{A,b) and 
lift the result back to the original space, see [1, 7], or [8, 9] in the context of QCD. These methods 
are not the topic of this paper, but we will use them for comparison in our numerical results. 

The idea of multishift methods is to approximate / by a rational function g, 

' m 

/W^gW = £— ^. (2.1) 

,•=1 ^ 

The systems 

(A - c7//);c('^ = , i=\,...,s {1.1) 

are treated with standard Krylov subspace methods such as the conjugate gradient method (CO) or 
the minimal residual method (MINRES), approximating x^'^ by Xk^''^ from a Krylov subspace. Since 

^The function / can be extended to matrix arguments by, e.g., a spectral definition or a contour integral. For a 
thorough treatment of matrix functions see [1]; a compact overview is given in [2]. 
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Krylov subspaces are shift invariant, i.e., Kk{A — OiI,b) = Kk{A,b), the approximations x^^'") can be 
computed simultaneously using the same subspace for all systems. The desired approximation is 
then obtained by combining the approximations to the s shifted systems 

f{A)b^x, = f^(OiXk^'K (2.3) 

!=1 

The core of any such method is the computation of an appropriate basis for the Krylov subspace. 
For Hermitian matrices an orthonormal basis can be built with short recurrences using the Lanczos 
process. These short recurrences are essential for the efficiency of the approach. 

Turning to non-Hermitian matrices, the computation of an orthogonal basis now requires long 
recurrences and is usually summarized via the Amoldi relation 

AVk = VkHk + h+i,kVk+iel . (2.4) 

Here, Vi = [vi | . . . jv^:] G C"^*^ is the matrix which contains the computed basis vectors (the Amoldi 
vectors), = Vt'AVk is the upper Hessenberg matrix containing the recurrence coefficients hij, 
and ejc denotes the k-th unit vector of C*^. 

For the rational approximation approach this means that the short-recurrence methods CG and 
MINRES have to be replaced by multishift versions of the corresponding long-recurrence methods, 
i.e., the full orthogonalization method (FOM) [10] and the generalized minimal residual method 
(GMRES) [1 1], respectively. 

Long recurrences slow down computation and increase storage requirements, and thus become 
inefficient or even infeasible if k, the dimension of the Krylov subspace, becomes large. In this 
paper we investigate restaits to circumvent this problem for non-Hermitian matrices. 

3. Restarts 

FOM to solve Ax = b consists of the Arnoldi process to compute the Arnoldi vectors vi,...,Vk 
as well as the upper Hessenberg matrix Hj^ = Vk^AV^ and of approximating x^xi^ = \\b\\2VkHi^^^ei. 
The Amoldi process applied to A — a,/ instead of A produces the same matrices with Hj^ replaced 
by the shifted counterpart — Gil. The k-th approximation to g{A)b, with g{x) defined in (2. 1), is 
thus given by 1 1 Z; 1 1 2 L 1 14 (Z/^: - a,-/) " ^ e i . 

To prevent recun^ences from becoming too long one can — in this case — use a restart proce- 
dure. This means that one stops the Arnoldi process after k^ax iterations. At this point we have a, 
possibly crude, approximation to g{A)b, and to allow for a restart one now has to express the error 
of this approximation anew as the action of a matrix function, g\{A)b\, say. 

A crucial observation concerning multishifts is that for any k the individual residuals r^^') = 
b — {A — OiI)xfS''> of the FOM iterates are just scalar multiples of the Arnoldi vector Vk+\, see, 
e.g., [10, 12], i.e., 

rk^^=Pk^^n+u / = !,..., 5 (3.1) 

with coUinearity factors p^-^'^ G C. The error A^; = g{A)b — x^ of the multishift approximation at 
step k can therefore be expressed as 

Ak = giiA)bi, where gi (0 = y'-^-^ and Zji =Vi:+i. (3.2) 
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This allows for a simple restart at step fc^ax of the Arnoldi process, with the new function gi again 
being rational with the same poles as g. This restart process can also be regarded as performing 
restarted FOM for each of the individual systems (A — OiI)x = b, i = I, . . . ,s (and combining the 
individual iterates appropriately), the point being that, even after a restart, we need only a single 
Krylov subspace for all s systems, see [10]. 

There also exists a restarted version of multishift GMRES, see [1 1] for a detailed derivation. 

4. Deflation 

In [8] two deflation approaches were proposed which use eigensystem information, namely 
Schur vectors (Schur deflation) or left and right eigenvectors (LR deflation) coiTcsponding to some 
"critical" eigenvalues. Critical eigenvalues are those which are close to a singularity of /. If they 
are not reflected very precisely in the Krylov subspace, we get a poor approximation. In case of 
the sign function the critical eigenvalues are those close to the imaginary axis. Here, we describe 
LR deflation (see [13] for the reason why this is the method of choice) and show how it can be 
combined with multishifts and restarts. 

Let R,„ = [ri I . . . |r,„] be the matrix containing the right eigenvectors and L,„^ = . . . |Z,„]^ 
the matrix containing the left eigenvectors corresponding to m critical eigenvalues of the matrix A. 
This means that we have 

AR,„=R,nA,„ and L„M = A,„lJ , (4.1) 

where A„, is a diagonal matrix containing the m critical eigenvalues. Since left and right eigen- 
vectors are biorthogonal, we can normalize them such that L„,^/?„, = /„,. The matrix P = R,nL,J 
represents an oblique projector onto the subspace = spanjri , . . . , r,„}. 
We now split f{A)b into the two parts 

f{A)b = f{A){Pb)+f{A){I-P)b. (4.2) 

Since we know the left and right eigenvectors which make up P, we directly obtain 

xp = f{A) {Pb) = f{A)R,„Lib = R,J{K,n){Llb) , (4.3) 

which can be computed exactly. The remaining part f{A){I — P)b can then be approximated it- 
eratively by using a multishift method. Thus f{A)b is now approximated in augmented Krylov 
subspaces Q.r + K]^{A,{1 -P)b), 

s 

Xk= xp + (^i^k ■ (4-4) 

e£2« e^:,(A,(/-P)fo) 

Theoretically, we have 

Kk{A, (/ - P)b) = (/ - P)Kk{A, (/ - P)b) C range(/ - P) , (4.5) 
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see [13]. In computational practice, however, components outside of range(/ — P) will show up 
gradually when building ^^(A, (/ — P)b) due to rounding effects in floating-point arithmetic. It is 
thus necessary to reapply I — P from time to time in order to eliminate these components. 

Since the only effect of LR deflation is the replacement of Z; by (/ — P)b, no modifications of 
the restart algorithm are necessary. 

5. Algorithms 

We combine multishift methods with restarts and deflation. We assume that the original func- 
tion / is replaced by a rational function (given by the shifts a,- and weights (oi) which approximates 
the original function sufficiently well after deflation. 

Depending on the underlying multishift method (FOM or GMRES), we get LR-deflated mul- 
tishift FOM (FOM-LR) or LR-deflated multishift GMRES (GMRES-LR). Algorithm 1 gives an 
algorithmic description of FOM-LR. (For an algorithmic description of GMRES-LR we refer to 
[13].) The notation FOM-LR(m,/:) indicates that we LR-deflate a subspace of dimension m and 
that we restart FOM after a cycle of k iterations. The vector x is the approximation to f{A)b. After 
the completion of each cycle we perform a projection step to eliminate numerical contamination 
by components outside of range (/ — P). 

Algorithm 1. Restarted FOM-LR(m,^) 

{Input m, k = ^max, A, {ai, . . . , aj, {wi, . . . , wj, b,L = L„,R = R^, A = A^} 

x = xp= Rf{A)L^b 

r = {I-P)b 

pi') = 1, / = 1,...,^ 

while not all systems are converged do [loop over restart cycles} 

j8 = lkll2 

VI = r/j8 

compute Vk, Hj^ by running k steps of Amoldi with A 

yf =^p^\Hk-Oih)-'eui = l,....s 

x = x + VkZU(Oiyf 

r = Vk+i 

p(') = -hk+i,kelyk\i = l,...,s 
r = {I — P)r {projection step} 
end while 



Note that a combination of deflation and a multishift method based on the two-sided Lanczos 
algorithm is also possible, see [13]. Of course, since two-sided Lanczos already gives short recur- 
rences, there is no need to restart here. 

6. Numerical results 

For our numerical experiments we turn to / = sign. In the Hermitian case, the sign function of 
A can be approximated using the Zolotarev best rational approximation, see [14] and, e.g., [15, 16]. 
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Using the Zolotarev approximation on non-Hermitian matrices gives rather poor resuUs, unless 
all eigenvalues are close to the real axis. A better choice for generic non-Hermitian matrices is 
the rational approximation originally suggested by Kenney and Laub [17] and used by Neuberger 
[18, 19] for vanishing chemical potential, 

(f + l)2.v_C^_;^\2.v 

sign(0 « , where g,{t) = ^ ^^2. ^ _ i)2.v ' (6. 1) 

The partial fraction expansion of gs is known to be 

see [17, 18]. Note that actually one uses g{ct), where the parameter c > is chosen to minimize 
the number of poles s needed to achieve a given accuracy. If the spectrum of A is known to be 
contained in the union of two circles C(m, r) U C{—m, r), where C{m, r) is the circle { |z — m| < r} 
and m and r are real with < r < m, then c = {{m + r){m — r))^^^^ is optimal, see [13, 16]. 

Figure 1 shows the performance of FOM-LR in comparison to the direct projection method. 
The ^-th approximation in the latter is given as 

||(/-P)^||2y;tsign(//^)ei , (6.3) 

where sign(//^) is computed via Roberts' method, see [1]. The relative performance of the two 
approaches depends on the parameters of the problem, such as the lattice size, the deflation gap, 
and the size of the Kiylov subspace. For more details, see [13]. We add that in the meantime an 
improved method to compute sign(//^) in the direct approach has been developed, see [20] in these 
proceedings. 




20 40 60 80 100 120 140 200 400 600 800 1000 



CPU time CPU time 

Figure 1: Comparison of the accuracy of the restarted FOM-LR algorithm (rFOM) and the direct two-sided 
Lanczos-LR method (2sL) as a function of the CPU time in seconds for an 8^ (left) and a 10^ (right) lattice 
configuration, using jj, = 0.3 in both cases. Each plot shows data for two different deflation gaps, given in 
parentheses. The restart size used in the restarted FOM-LR algorithm is ^max = 30 for the 8"* lattice and 
^n,ax = 40 for the 10"* lattice. 
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Figure 2 is meant to convey a warning. It shows that the projection step after each restart, as 
formulated in Algorithm 1, may be crucial to ensure convergence. In both plots we give results for 
Algorithm 1 and a variant thereof in which the projection step is omitted. The right plot shows that 
this may destroy convergence, the left plot shows that this is not necessarily so. Since the CPU 
time is increased only mai^ginally by the projection step, the latter should always be included. 
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Figure 2: Error vs CPU time for the FOM-LR algorithm with and without re-orthogonalization for 4^ and 
6^ (left) as well as 8^ and lO"* (right) lattices. We again used jU — 0.3 in all cases. 



7. Conclusion 

We have presented an algorithm, FOM-LR, to approximate the action of the sign function of a 
non-Hermitian matrix on a vector. This algorithm combines LR deflation and a rational approxima- 
tion to the sign function, which is computed by a restarted multishift method. The latter has fixed 
storage requirements determined by the restart parameter (maximum size of the Krylov subspace) 
and the degree of the rational approximation. Occasionally, additional projections of the Krylov 
vectors ai^e necessary for numerical stability. 

Whether FOM-LR or a direct method (i.e., the two-sided Lanczos-LR method) performs bet- 
ter depends on many details of the problem. Some of them have been mentioned in Section 6. 
Others include implementation issues such as optimized linear algebra libraries, and ultimately 
parallelization. 
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